
# Multimodal Input

Gemma 3 12b Pt Qat Q4 0 Gguf
Gemma 3 is a lightweight open-source multimodal model from Google. It accepts text and image input, produces text output, has a 128K-token context window, and supports more than 140 languages.
Image-to-Text
google
475
12
3dtopia XL
Apache-2.0
3DTopia-XL is a diffusion Transformer built on PrimX, an efficient 3D representation, and can rapidly generate high-quality 3D assets.
3D Vision
FrozenBurning
129
45
Sam2 Hiera Base Plus
Apache-2.0
SAM 2 is a foundation model developed by FAIR for promptable visual segmentation in images and videos.
Image Segmentation
facebook
18.17k
6